

<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />

  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  
  <title>Erasure Code developer notes &mdash; Ceph Documentation</title>
  

  
  <link rel="stylesheet" href="../../../../_static/ceph.css" type="text/css" />
  <link rel="stylesheet" href="../../../../_static/pygments.css" type="text/css" />
  <link rel="stylesheet" href="../../../../_static/graphviz.css" type="text/css" />
  <link rel="stylesheet" href="../../../../_static/css/custom.css" type="text/css" />

  
  

  
  

  

  
  <!--[if lt IE 9]>
    <script src="../../../../_static/js/html5shiv.min.js"></script>
  <![endif]-->
  
    
      <script type="text/javascript" id="documentation_options" data-url_root="../../../../" src="../../../../_static/documentation_options.js"></script>
        <script src="../../../../_static/jquery.js"></script>
        <script src="../../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
        <script src="../../../../_static/doctools.js"></script>
        <script src="../../../../_static/sphinx_highlight.js"></script>
    
    <script type="text/javascript" src="../../../../_static/js/theme.js"></script>

    
    <link rel="index" title="Index" href="../../../../genindex/" />
    <link rel="search" title="Search" href="../../../../search/" />
    <link rel="next" title="jerasure plugin" href="../jerasure/" />
    <link rel="prev" title="Erasure Coded Placement Groups" href="../" /> 
</head>

<body class="wy-body-for-nav">

   
  <div class="wy-grid-for-nav">
    

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">

      


      <div class="wy-nav-content">
        
        <div class="rst-content">
        
          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
           <div itemprop="articleBody">
            
<div id="dev-warning" class="admonition note">
  <p class="first admonition-title">Notice</p>
  <p class="last">This document is for a development version of Ceph.</p>
</div>

  
  <section id="erasure-code-developer-notes">
<h1>Erasure Code developer notes<a class="headerlink" href="#erasure-code-developer-notes" title="Permalink to this heading"></a></h1>
<section id="introduction">
<h2>Introduction<a class="headerlink" href="#introduction" title="Permalink to this heading"></a></h2>
<p>Each chapter of this document explains one aspect of how erasure
coding is implemented in Ceph, mostly by walking through worked
examples.</p>
</section>
<section id="reading-and-writing-encoded-chunks-from-and-to-osds">
<h2>Reading and writing encoded chunks from and to OSDs<a class="headerlink" href="#reading-and-writing-encoded-chunks-from-and-to-osds" title="Permalink to this heading"></a></h2>
<p>An erasure coded pool stores each object as K+M chunks: K data
chunks and M coding chunks. The pool is configured to have a size of
K+M so that each chunk is stored in an OSD in the acting set. The
rank of the chunk is stored as an attribute of the object.</p>
<p>Let’s say an erasure coded pool is created to use five OSDs (K+M =
5) and sustain the loss of two of them (M = 2), which leaves K = 3
data chunks.</p>
<p>When the object <em>NYAN</em> containing <em>ABCDEFGHI</em> is written to it, the
erasure encoding function splits the content into three data chunks,
simply by dividing the content in three: the first contains <em>ABC</em>,
the second <em>DEF</em> and the last <em>GHI</em>. The content is padded if its
length is not a multiple of K. The function also creates two
coding chunks: the fourth with <em>YXY</em> and the fifth with <em>QGC</em>. Each
chunk is stored in an OSD in the acting set. The chunks are stored in
objects that have the same name (<em>NYAN</em>) but reside on different
OSDs. The order in which the chunks were created must be preserved and
is stored as an attribute of the object (shard_t), in addition to its
name. Chunk <em>1</em> contains <em>ABC</em> and is stored on <em>OSD5</em>, while chunk <em>4</em>
contains <em>YXY</em> and is stored on <em>OSD3</em>.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>                          <span class="o">+-------------------+</span>
                     <span class="n">name</span> <span class="o">|</span>        <span class="n">NYAN</span>       <span class="o">|</span>
                          <span class="o">+-------------------+</span>
                  <span class="n">content</span> <span class="o">|</span>      <span class="n">ABCDEFGHI</span>    <span class="o">|</span>
                          <span class="o">+--------+----------+</span>
                                   <span class="o">|</span>
                                   <span class="o">|</span>
                                   <span class="n">v</span>
                            <span class="o">+------+------+</span>
            <span class="o">+---------------+</span> <span class="n">encode</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="o">+-----------+</span>
            <span class="o">|</span>               <span class="o">+--+--+---+---+</span>           <span class="o">|</span>
            <span class="o">|</span>                  <span class="o">|</span>  <span class="o">|</span>   <span class="o">|</span>               <span class="o">|</span>
            <span class="o">|</span>          <span class="o">+-------+</span>  <span class="o">|</span>   <span class="o">+-----+</span>         <span class="o">|</span>
            <span class="o">|</span>          <span class="o">|</span>          <span class="o">|</span>         <span class="o">|</span>         <span class="o">|</span>
         <span class="o">+--</span><span class="n">v</span><span class="o">---+</span>   <span class="o">+--</span><span class="n">v</span><span class="o">---+</span>   <span class="o">+--</span><span class="n">v</span><span class="o">---+</span>  <span class="o">+--</span><span class="n">v</span><span class="o">---+</span>  <span class="o">+--</span><span class="n">v</span><span class="o">---+</span>
   <span class="n">name</span>  <span class="o">|</span> <span class="n">NYAN</span> <span class="o">|</span>   <span class="o">|</span> <span class="n">NYAN</span> <span class="o">|</span>   <span class="o">|</span> <span class="n">NYAN</span> <span class="o">|</span>  <span class="o">|</span> <span class="n">NYAN</span> <span class="o">|</span>  <span class="o">|</span> <span class="n">NYAN</span> <span class="o">|</span>
         <span class="o">+------+</span>   <span class="o">+------+</span>   <span class="o">+------+</span>  <span class="o">+------+</span>  <span class="o">+------+</span>
  <span class="n">shard</span>  <span class="o">|</span>  <span class="mi">1</span>   <span class="o">|</span>   <span class="o">|</span>  <span class="mi">2</span>   <span class="o">|</span>   <span class="o">|</span>  <span class="mi">3</span>   <span class="o">|</span>  <span class="o">|</span>  <span class="mi">4</span>   <span class="o">|</span>  <span class="o">|</span>  <span class="mi">5</span>   <span class="o">|</span>
         <span class="o">+------+</span>   <span class="o">+------+</span>   <span class="o">+------+</span>  <span class="o">+------+</span>  <span class="o">+------+</span>
<span class="n">content</span>  <span class="o">|</span> <span class="n">ABC</span>  <span class="o">|</span>   <span class="o">|</span> <span class="n">DEF</span>  <span class="o">|</span>   <span class="o">|</span> <span class="n">GHI</span>  <span class="o">|</span>  <span class="o">|</span> <span class="n">YXY</span>  <span class="o">|</span>  <span class="o">|</span> <span class="n">QGC</span>  <span class="o">|</span>
         <span class="o">+--+---+</span>   <span class="o">+--+---+</span>   <span class="o">+--+---+</span>  <span class="o">+--+---+</span>  <span class="o">+--+---+</span>
            <span class="o">|</span>          <span class="o">|</span>          <span class="o">|</span>         <span class="o">|</span>         <span class="o">|</span>
            <span class="o">|</span>          <span class="o">|</span>          <span class="o">|</span>         <span class="o">|</span>         <span class="o">|</span>
            <span class="o">|</span>          <span class="o">|</span>       <span class="o">+--+---+</span>     <span class="o">|</span>         <span class="o">|</span>
            <span class="o">|</span>          <span class="o">|</span>       <span class="o">|</span> <span class="n">OSD1</span> <span class="o">|</span>     <span class="o">|</span>         <span class="o">|</span>
            <span class="o">|</span>          <span class="o">|</span>       <span class="o">+------+</span>     <span class="o">|</span>         <span class="o">|</span>
            <span class="o">|</span>          <span class="o">|</span>       <span class="o">+------+</span>     <span class="o">|</span>         <span class="o">|</span>
            <span class="o">|</span>          <span class="o">+------&gt;|</span> <span class="n">OSD2</span> <span class="o">|</span>     <span class="o">|</span>         <span class="o">|</span>
            <span class="o">|</span>                  <span class="o">+------+</span>     <span class="o">|</span>         <span class="o">|</span>
            <span class="o">|</span>                  <span class="o">+------+</span>     <span class="o">|</span>         <span class="o">|</span>
            <span class="o">|</span>                  <span class="o">|</span> <span class="n">OSD3</span> <span class="o">|&lt;----+</span>         <span class="o">|</span>
            <span class="o">|</span>                  <span class="o">+------+</span>               <span class="o">|</span>
            <span class="o">|</span>                  <span class="o">+------+</span>               <span class="o">|</span>
            <span class="o">|</span>                  <span class="o">|</span> <span class="n">OSD4</span> <span class="o">|&lt;--------------+</span>
            <span class="o">|</span>                  <span class="o">+------+</span>
            <span class="o">|</span>                  <span class="o">+------+</span>
            <span class="o">+-----------------&gt;|</span> <span class="n">OSD5</span> <span class="o">|</span>
                               <span class="o">+------+</span>
</pre></div>
</div>
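<p>The write path described above can be sketched in a few lines of
Python. This is a minimal illustration and not Ceph's implementation:
the function names <em>split_into_data_chunks</em> and <em>place_chunks</em> are
hypothetical, the zero-byte padding is an assumption, and the two
coding chunks are copied from the diagram rather than computed. The
shard-to-OSD mapping is read off the diagram.</p>

```python
def split_into_data_chunks(content: bytes, k: int, pad: bytes = b"\0") -> list[bytes]:
    """Split content into k equal-sized data chunks, padding the tail if needed."""
    chunk_size = -(-len(content) // k)  # ceiling division
    padded = content.ljust(chunk_size * k, pad)
    return [padded[i * chunk_size:(i + 1) * chunk_size] for i in range(k)]


def place_chunks(name, data_chunks, coding_chunks, acting_set):
    """Map each chunk to an OSD, recording the object name and the shard rank."""
    placement = {}
    for rank, chunk in enumerate(data_chunks + coding_chunks, start=1):
        placement[acting_set[rank]] = {"name": name, "shard": rank, "content": chunk}
    return placement


# Shard rank -> OSD, as drawn in the diagram (ranks are 1-based there).
ACTING_SET = {1: "OSD5", 2: "OSD2", 3: "OSD1", 4: "OSD3", 5: "OSD4"}

data = split_into_data_chunks(b"ABCDEFGHI", k=3)
coding = [b"YXY", b"QGC"]  # illustrative values from the diagram, not computed here
placement = place_chunks("NYAN", data, coding, ACTING_SET)
```

<p>Every entry of <em>placement</em> carries the same object name and a
distinct shard rank, which is what allows the chunks to be put back in
order when the object is read.</p>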
<p>When the object <em>NYAN</em> is read from the erasure coded pool, the
decoding function reads three chunks: chunk <em>1</em> containing <em>ABC</em>,
chunk <em>3</em> containing <em>GHI</em> and chunk <em>4</em> containing <em>YXY</em>, and rebuilds
the original content of the object, <em>ABCDEFGHI</em>. The decoding function
is informed that chunks <em>2</em> and <em>5</em> are missing (they are called
<em>erasures</em>). Chunk <em>5</em> could not be read because <em>OSD4</em> is
<em>out</em>.</p>
<p>The decoding function can be called as soon as three chunks have
been read: <em>OSD2</em> was the slowest, so its chunk does not need to be
taken into account. This optimization is not implemented in Firefly.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>                          <span class="o">+-------------------+</span>
                     <span class="n">name</span> <span class="o">|</span>        <span class="n">NYAN</span>       <span class="o">|</span>
                          <span class="o">+-------------------+</span>
                  <span class="n">content</span> <span class="o">|</span>      <span class="n">ABCDEFGHI</span>    <span class="o">|</span>
                          <span class="o">+--------+----------+</span>
                                   <span class="o">^</span>
                                   <span class="o">|</span>
                                   <span class="o">|</span>
                            <span class="o">+------+------+</span>
                            <span class="o">|</span> <span class="n">decode</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="o">|</span>
                            <span class="o">|</span> <span class="n">erasures</span> <span class="mi">2</span><span class="p">,</span><span class="mi">5</span><span class="o">|</span>
            <span class="o">+--------------&gt;|</span>             <span class="o">|</span>
            <span class="o">|</span>               <span class="o">+-------------+</span>
            <span class="o">|</span>                     <span class="o">^</span>   <span class="o">^</span>
            <span class="o">|</span>                     <span class="o">|</span>   <span class="o">+-----+</span>
            <span class="o">|</span>                     <span class="o">|</span>         <span class="o">|</span>
         <span class="o">+--+---+</span>   <span class="o">+------+</span>   <span class="o">+--+---+</span>  <span class="o">+--+---+</span>
   <span class="n">name</span>  <span class="o">|</span> <span class="n">NYAN</span> <span class="o">|</span>   <span class="o">|</span> <span class="n">NYAN</span> <span class="o">|</span>   <span class="o">|</span> <span class="n">NYAN</span> <span class="o">|</span>  <span class="o">|</span> <span class="n">NYAN</span> <span class="o">|</span>
         <span class="o">+------+</span>   <span class="o">+------+</span>   <span class="o">+------+</span>  <span class="o">+------+</span>
  <span class="n">shard</span>  <span class="o">|</span>  <span class="mi">1</span>   <span class="o">|</span>   <span class="o">|</span>  <span class="mi">2</span>   <span class="o">|</span>   <span class="o">|</span>  <span class="mi">3</span>   <span class="o">|</span>  <span class="o">|</span>  <span class="mi">4</span>   <span class="o">|</span>
         <span class="o">+------+</span>   <span class="o">+------+</span>   <span class="o">+------+</span>  <span class="o">+------+</span>
<span class="n">content</span>  <span class="o">|</span> <span class="n">ABC</span>  <span class="o">|</span>   <span class="o">|</span> <span class="n">DEF</span>  <span class="o">|</span>   <span class="o">|</span> <span class="n">GHI</span>  <span class="o">|</span>  <span class="o">|</span> <span class="n">YXY</span>  <span class="o">|</span>
         <span class="o">+--+---+</span>   <span class="o">+--+---+</span>   <span class="o">+--+---+</span>  <span class="o">+--+---+</span>
            <span class="o">^</span>          <span class="o">.</span>          <span class="o">^</span>         <span class="o">^</span>
            <span class="o">|</span>    <span class="n">TOO</span>   <span class="o">.</span>          <span class="o">|</span>         <span class="o">|</span>
            <span class="o">|</span>    <span class="n">SLOW</span>  <span class="o">.</span>       <span class="o">+--+---+</span>     <span class="o">|</span>
            <span class="o">|</span>          <span class="o">^</span>       <span class="o">|</span> <span class="n">OSD1</span> <span class="o">|</span>     <span class="o">|</span>
            <span class="o">|</span>          <span class="o">|</span>       <span class="o">+------+</span>     <span class="o">|</span>
            <span class="o">|</span>          <span class="o">|</span>       <span class="o">+------+</span>     <span class="o">|</span>
            <span class="o">|</span>          <span class="o">+-------|</span> <span class="n">OSD2</span> <span class="o">|</span>     <span class="o">|</span>
            <span class="o">|</span>                  <span class="o">+------+</span>     <span class="o">|</span>
            <span class="o">|</span>                  <span class="o">+------+</span>     <span class="o">|</span>
            <span class="o">|</span>                  <span class="o">|</span> <span class="n">OSD3</span> <span class="o">|-----+</span>
            <span class="o">|</span>                  <span class="o">+------+</span>
            <span class="o">|</span>                  <span class="o">+------+</span>
            <span class="o">|</span>                  <span class="o">|</span> <span class="n">OSD4</span> <span class="o">|</span> <span class="n">OUT</span>
            <span class="o">|</span>                  <span class="o">+------+</span>
            <span class="o">|</span>                  <span class="o">+------+</span>
            <span class="o">+------------------|</span> <span class="n">OSD5</span> <span class="o">|</span>
                               <span class="o">+------+</span>
</pre></div>
</div>
</section>
<section id="erasure-code-library">
<h2>Erasure code library<a class="headerlink" href="#erasure-code-library" title="Permalink to this heading"></a></h2>
<p>Using <a class="reference external" href="https://en.wikipedia.org/wiki/Reed_Solomon">Reed-Solomon</a>
with parameters K+M, object O is encoded by dividing it into K data chunks O1,
O2, … OK and computing M coding chunks P1, P2, … PM. Any K chunks
out of the available K+M chunks can be used to obtain the original
object. If data chunk O2 or coding chunk P2 is lost, it can be
repaired using any K of the remaining chunks. If more than M
chunks are lost, it is not possible to recover the object.</p>
<p>Reading the original content of object O can be a simple
concatenation of O1, O2, … OK, because the plugins use
<a class="reference external" href="https://en.wikipedia.org/wiki/Systematic_code">systematic codes</a>. If some data chunks
are missing, the available chunks must be given to the erasure code library
<em>decode</em> method to retrieve the content of the object.</p>
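<p>The encode, read and repair steps described above can be sketched with the
simplest possible systematic code: K data chunks plus a single XOR parity chunk
(M=1). This is a stand-in chosen for brevity, not Reed-Solomon and not what the
Ceph plugins do internally. The function names <em>split_into_chunks</em>,
<em>xor_chunks</em> and <em>repair_and_read</em> are hypothetical and exist only in
this illustration.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split `object` into k equally sized data chunks (O1 .. Ok), zero-padding
// the tail. Assumes k >= 1.
std::vector<std::string> split_into_chunks(const std::string &object,
                                           std::size_t k) {
  const std::size_t chunk_size = (object.size() + k - 1) / k;
  std::vector<std::string> chunks;
  for (std::size_t i = 0; i < k; ++i) {
    std::string chunk = i * chunk_size < object.size()
                            ? object.substr(i * chunk_size, chunk_size)
                            : std::string();
    chunk.resize(chunk_size, '\0');
    chunks.push_back(chunk);
  }
  return chunks;
}

// XOR equally sized chunks together byte by byte. With M=1 this computes the
// coding chunk P1, and it also rebuilds one lost chunk from the survivors.
std::string xor_chunks(const std::vector<std::string> &chunks) {
  std::string out(chunks.front().size(), '\0');
  for (const std::string &chunk : chunks)
    for (std::size_t i = 0; i < out.size(); ++i)
      out[i] = static_cast<char>(out[i] ^ chunk[i]);
  return out;
}

// Encode with K=2, M=1, pretend O2 is lost, repair it from O1 and P1, then
// read the object back by concatenating the data chunks.
std::string repair_and_read(const std::string &object) {
  const std::vector<std::string> data = split_into_chunks(object, 2);
  const std::string parity = xor_chunks(data);                 // P1
  const std::string rebuilt = xor_chunks({data[0], parity});   // O2 again
  assert(rebuilt == data[1]);
  std::string content = data[0] + rebuilt;  // systematic read: O1 + O2
  content.resize(object.size());            // drop the zero padding
  return content;
}
```

<p>Because the code is systematic, the read path is a plain concatenation once
every data chunk is present; the XOR step is only needed for the lost chunk.
Calling <em>repair_and_read</em> on a six-byte object such as <em>ABCDEF</em>
returns the same six bytes.</p>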
<p>Performance depends on the parameters passed to the encoding functions and
is also influenced by the packet sizes used when calling the encoding
functions (for Cauchy or Liberation, for instance): smaller packets
mean more calls and more overhead.</p>
<p>Although Reed-Solomon is provided as a default, Ceph uses it via an
<a class="reference external" href="https://github.com/ceph/ceph/blob/v0.78/src/erasure-code/ErasureCodeInterface.h">abstract API</a> designed to
allow each pool to choose the plugin that implements it using
key=value pairs stored in an <a class="reference external" href="../../../erasure-coded-pool">erasure code profile</a>.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ ceph osd erasure-code-profile set myprofile \
    crush-failure-domain=osd
$ ceph osd erasure-code-profile get myprofile
directory=/usr/lib/ceph/erasure-code
k=2
m=1
plugin=jerasure
technique=reed_sol_van
crush-failure-domain=osd
$ ceph osd pool create ecpool 12 12 erasure myprofile
</pre></div>
</div>
<p>The <em>plugin</em> is dynamically loaded from <em>directory</em> and is expected to
implement the <em>int __erasure_code_init(char *plugin_name, char *directory)</em> function,
which is responsible for registering an object derived from <em>ErasureCodePlugin</em>
in the registry. The <a class="reference external" href="https://github.com/ceph/ceph/blob/v0.78/src/test/erasure-code/ErasureCodePluginExample.cc">ErasureCodePluginExample</a> plugin reads:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ErasureCodePluginRegistry</span> <span class="o">&amp;</span><span class="n">instance</span> <span class="o">=</span>
                           <span class="n">ErasureCodePluginRegistry</span><span class="p">::</span><span class="n">instance</span><span class="p">();</span>
<span class="n">instance</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">plugin_name</span><span class="p">,</span> <span class="n">new</span> <span class="n">ErasureCodePluginExample</span><span class="p">());</span>
</pre></div>
</div>
<p>The <em>ErasureCodePlugin</em> derived object must provide a factory method
from which the concrete implementation of the <em>ErasureCodeInterface</em>
object can be generated. The <a class="reference external" href="https://github.com/ceph/ceph/blob/v0.78/src/test/erasure-code/ErasureCodePluginExample.cc">ErasureCodePluginExample plugin</a> reads:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">virtual</span> <span class="nb">int</span> <span class="n">factory</span><span class="p">(</span><span class="n">const</span> <span class="nb">map</span><span class="o">&lt;</span><span class="n">std</span><span class="p">::</span><span class="n">string</span><span class="p">,</span><span class="n">std</span><span class="p">::</span><span class="n">string</span><span class="o">&gt;</span> <span class="o">&amp;</span><span class="n">parameters</span><span class="p">,</span>
                    <span class="n">ErasureCodeInterfaceRef</span> <span class="o">*</span><span class="n">erasure_code</span><span class="p">)</span> <span class="p">{</span>
  <span class="o">*</span><span class="n">erasure_code</span> <span class="o">=</span> <span class="n">ErasureCodeInterfaceRef</span><span class="p">(</span><span class="n">new</span> <span class="n">ErasureCodeExample</span><span class="p">(</span><span class="n">parameters</span><span class="p">));</span>
  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
</div>
<p>The <em>parameters</em> argument is the list of <em>key=value</em> pairs that were
set in the erasure code profile, before the pool was created.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">osd</span> <span class="n">erasure</span><span class="o">-</span><span class="n">code</span><span class="o">-</span><span class="n">profile</span> <span class="nb">set</span> <span class="n">myprofile</span> \
   <span class="n">directory</span><span class="o">=&lt;</span><span class="nb">dir</span><span class="o">&gt;</span>         \ <span class="c1"># mandatory</span>
   <span class="n">plugin</span><span class="o">=</span><span class="n">jerasure</span>         \ <span class="c1"># mandatory</span>
   <span class="n">m</span><span class="o">=</span><span class="mi">10</span>                    \ <span class="c1"># optional and plugin dependent</span>
   <span class="n">k</span><span class="o">=</span><span class="mi">3</span>                     \ <span class="c1"># optional and plugin dependent</span>
   <span class="n">technique</span><span class="o">=</span><span class="n">reed_sol_van</span>  \ <span class="c1"># optional and plugin dependent</span>
</pre></div>
</div>
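<p>The way registration, the factory method and the profile parameters fit
together can be sketched with simplified stand-ins. Every type below
(<em>CodecSketch</em>, <em>PluginSketch</em>, <em>RegistrySketch</em>,
<em>ExampleCodec</em>, <em>ExamplePlugin</em>) and the function
<em>register_example_plugin</em> are hypothetical names for this illustration.
They are not the Ceph declarations, and the fallback values k=2 and m=1 are an
assumption of the sketch.</p>

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// key=value pairs as they would arrive from an erasure code profile.
using Profile = std::map<std::string, std::string>;

// Plays the role of ErasureCodeInterface (greatly reduced).
struct CodecSketch {
  virtual ~CodecSketch() = default;
  virtual unsigned chunk_count() const = 0;  // k + m
};
using CodecSketchRef = std::shared_ptr<CodecSketch>;

// Plays the role of ErasureCodePlugin: it only knows how to build a codec.
struct PluginSketch {
  virtual ~PluginSketch() = default;
  virtual int factory(const Profile &parameters, CodecSketchRef *codec) = 0;
};

// Plays the role of ErasureCodePluginRegistry: a name -> plugin map.
class RegistrySketch {
 public:
  static RegistrySketch &instance() {
    static RegistrySketch registry;
    return registry;
  }
  void add(const std::string &name, std::unique_ptr<PluginSketch> plugin) {
    plugins_[name] = std::move(plugin);
  }
  // Find the plugin named by parameters["plugin"] and delegate to its factory.
  int factory(const Profile &parameters, CodecSketchRef *codec) {
    auto name = parameters.find("plugin");
    if (name == parameters.end()) return -1;
    auto plugin = plugins_.find(name->second);
    if (plugin == plugins_.end()) return -1;
    return plugin->second->factory(parameters, codec);
  }

 private:
  std::map<std::string, std::unique_ptr<PluginSketch>> plugins_;
};

// A codec that reads the optional, plugin-dependent k and m parameters.
class ExampleCodec : public CodecSketch {
 public:
  explicit ExampleCodec(const Profile &p)
      : k_(read(p, "k", 2)), m_(read(p, "m", 1)) {}
  unsigned chunk_count() const override { return k_ + m_; }

 private:
  static unsigned read(const Profile &p, const std::string &key,
                       unsigned fallback) {
    auto it = p.find(key);
    return it == p.end() ? fallback
                         : static_cast<unsigned>(std::stoul(it->second));
  }
  unsigned k_, m_;
};

struct ExamplePlugin : PluginSketch {
  int factory(const Profile &parameters, CodecSketchRef *codec) override {
    *codec = std::make_shared<ExampleCodec>(parameters);
    return 0;
  }
};

// Plays the role of the plugin's init function: register under plugin_name.
int register_example_plugin(const std::string &plugin_name) {
  RegistrySketch::instance().add(plugin_name,
                                 std::make_unique<ExamplePlugin>());
  return 0;
}
```

<p>In this sketch the registry never inspects <em>k</em>, <em>m</em> or
<em>technique</em>; it only uses the <em>plugin</em> key to pick a plugin and
passes the whole map through, which is why those keys can be optional and plugin
dependent.</p>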
</section>
<section id="notes">
<h2>Notes<a class="headerlink" href="#notes" title="Permalink to this heading"></a></h2>
<p>If the objects are large, it may be impractical to encode and decode
them in memory. In practice this is rarely a concern: when using <em>RBD</em>,
a 1TB device is divided into many individual 4MB objects, and <em>RGW</em> does the same.</p>
<p>Encoding and decoding are implemented in the OSD. Although they could be
implemented on the client side for reads and writes, the OSD must be able to encode
and decode on its own when scrubbing.</p>
</section>
</section>





           </div>
           
          </div>
          <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
        <a href="../" class="btn btn-neutral float-left" title="Erasure-coded placement groups" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
        <a href="../jerasure/" class="btn btn-neutral float-right" title="jerasure plugin" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
    </div>

  <hr/>

  <div role="contentinfo">
    <p>&#169; Copyright 2016, Ceph authors and contributors. Licensed under Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0).</p>
  </div>

   

</footer>
        </div>
      </div>

    </section>

  </div>
  

  <script type="text/javascript">
      jQuery(function () {
          SphinxRtdTheme.Navigation.enable(true);
      });
  </script>

  
  
    
   

</body>
</html>